@mmakevic-amd mmakevic-amd commented Dec 24, 2025

Motivation

Bi-weekly sync from TensorFlow upstream

Disabled tests:

I reviewed old disabled UTs; some were enabled, and some were moved to the testing scripts excluded list. All details in https://github.com/ROCm/frameworks-internal/issues/14968

Submission Checklist

tensorflower-gardener and others added 30 commits December 18, 2025 02:55
PiperOrigin-RevId: 846167560
…intExpression.

Helps narrow down which constraints are unsatisfiable. There can be many constraints (e.g. WGMMA in Mosaic), and while debugging it is not clear at a glance which one is violated.

As a follow-up, we can also give each Constraint a name to make identification even easier.

PiperOrigin-RevId: 846168559
PiperOrigin-RevId: 846171859
PiperOrigin-RevId: 846173555
…TF normalization in emitters

0) Fix a bug (?) in normalization util when normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use normalized shape for unrolling decision in kLoop emitter
3) Use normalized shape to detect slow transposes in triton fusion rewriter

PiperOrigin-RevId: 846191206
…t.cc

This change updates custom_call_test.cc to dynamically register custom call targets and FFI handlers using the runtime-determined platform name (CUDA or ROCM). This replaces the use of static registration macros, allowing the tests to run correctly across different GPU platforms and the reference interpreter.

This way we can avoid compile time branches like `#ifdef GOOGLE_CUDA` and similar.

Also:

1. Converts usage of raw CUDA driver API functions to StreamExecutor functionality
2. Replaces some legacy CustomCalls by FFI
3. Converts the while test target to HloRunnerPjRt
4. Removes a test case from the Token tests with a nested type in the output type, since that's not supported by our PjRt implementation.
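
The runtime-registration idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `custom_call_test.cc`: `Handler`, `HandlerRegistry`, and `RegisterForPlatform` are hypothetical stand-ins, and the real test uses XLA's own custom-call and FFI registration facilities.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a custom-call handler; not XLA's real type.
using Handler = std::function<int(int)>;

// Hypothetical registry keyed by (platform name, target name).
class HandlerRegistry {
 public:
  // Registers `handler` at run time; returns false if the key is taken.
  bool Register(const std::string& platform, const std::string& target,
                Handler handler) {
    return handlers_
        .emplace(std::make_pair(platform, target), std::move(handler))
        .second;
  }

  // Returns the handler for the key, or nullptr if none is registered.
  const Handler* Find(const std::string& platform,
                      const std::string& target) const {
    auto it = handlers_.find(std::make_pair(platform, target));
    return it == handlers_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, Handler> handlers_;
};

// Registers a test handler under whichever platform name was determined at
// run time, so no `#ifdef GOOGLE_CUDA`-style branch is needed.
inline bool RegisterForPlatform(HandlerRegistry& registry,
                                const std::string& platform_name) {
  return registry.Register(platform_name, "add_one",
                           [](int x) { return x + 1; });
}
```

A test would query the platform name once (for example "CUDA" or "ROCM"), call `RegisterForPlatform` with it, and then look the handler up under the same name.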

PiperOrigin-RevId: 846196106
The `fd.Size()` check doesn't work when the file descriptor is invalid and only
the path was given.

PiperOrigin-RevId: 846207406
PiperOrigin-RevId: 846213195
PiperOrigin-RevId: 846214738
PiperOrigin-RevId: 846217449
PiperOrigin-RevId: 846221752
The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete.

PiperOrigin-RevId: 846226180
PiperOrigin-RevId: 846226345
PiperOrigin-RevId: 846231902
PiperOrigin-RevId: 846234559
This migrates `builder.create<Op>()` => `Op::create()`

PiperOrigin-RevId: 846246070
This change moves the definition of `AotCompilationResult` into a new header file `compiled_module.h` and renames the class to `CompiledModule`. `CompilationResult` would have been the preferred name, but it's already in-use elsewhere.

The original `AotCompilationResult` is kept as a deprecated alias.
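
One common way to keep an old class name as a deprecated alias is shown below. This is a sketch only: the `CompiledModule` body here is a simplified placeholder, and the commit may use a different mechanism (for example a project-specific deprecation macro) than the standard `[[deprecated]]` attribute assumed here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified placeholder for the renamed class; the real one lives in XLA.
class CompiledModule {
 public:
  explicit CompiledModule(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// The old name stays available as an alias, so existing callers still
// compile, but each use produces a deprecation warning naming the new class.
using AotCompilationResult [[deprecated("Use CompiledModule instead.")]] =
    CompiledModule;
```

With this in place, `AotCompilationResult result("m");` still compiles and refers to the same type, while the compiler points callers at `CompiledModule`.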

PiperOrigin-RevId: 846246415
…ests, rather than on the original dimensions.

These are simpler both to write and to think about.

No behavior changes are intended.

PiperOrigin-RevId: 846253300
PiperOrigin-RevId: 846257722
… its allocation later

Imported from GitHub PR openxla/xla#35510

📝 Summary of Changes
Initialize collectives pointer to nullptr

🎯 Justification

GPU runtime options are initialized in TF and passed to XLA to execute thunks. Because the memory is not cleared, the collectives pointer refers to uninitialized memory, which causes a segfault during NCCL collective initialization and operation.

🚀 Kind of Contribution
🐛 Bug Fix

Copybara import of the project:

--
2bfc6fbddbf2f9a926dd504169c56be45d2f1a0a by Harsha HS <[email protected]>:

[ROCm] Initialze collectives to nullptr to force its allocation later

Merging this change closes tensorflow#35510

PiperOrigin-RevId: 846266642
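
The null-initialization fix from openxla/xla#35510 follows a familiar pattern, sketched here under stated assumptions. `Collectives`, `RuntimeOptions`, and `GetOrCreateCollectives` are hypothetical names for illustration and do not mirror the actual XLA types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a collectives implementation.
struct Collectives {
  int rank_count = 0;
};

// Hypothetical options struct. The default member initializer guarantees the
// pointer is null even when the caller never sets it, instead of holding
// whatever bytes happened to be in memory.
struct RuntimeOptions {
  Collectives* collectives = nullptr;
};

// Returns the caller-provided collectives, or allocates a default instance
// into `owned` on first use when the pointer is null.
inline Collectives* GetOrCreateCollectives(
    RuntimeOptions& options, std::unique_ptr<Collectives>& owned) {
  if (options.collectives == nullptr) {
    owned = std::make_unique<Collectives>();
    options.collectives = owned.get();
  }
  return options.collectives;
}
```

The null check is only reliable because the pointer starts out as `nullptr`; with an uninitialized pointer the check could pass on garbage and the later dereference would crash.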
This migrates `builder.create<Op>()` => `Op::create()`

PiperOrigin-RevId: 846268375
…utor_test.

The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc.

PiperOrigin-RevId: 846269233
Imported from GitHub PR openxla/xla#35482

Sometimes the JSON compile commands from Bazel are parsed incorrectly, and we end up passing flags such as

```
"-isystem path/to/includes"
```

to `clangd`, where these flags are parsed incorrectly.
Copybara import of the project:

--
adf291e21b098d79fa3be4065ee02fafdf5c660a by Eugene Zhulenev <[email protected]>:

Correctly generate compile_commands.json

Merging this change closes tensorflow#35482

PiperOrigin-RevId: 846269357
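
To illustrate the problem addressed by openxla/xla#35482: a flag and its value fused into a single argument need to become two arguments before a tool such as `clangd` sees them. The helper below is a hypothetical sketch of that splitting step, not the fix from the PR, and the list of flags it handles is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits arguments such as "-isystem path/to/includes" into two separate
// arguments. Only the flags listed in kFlagsWithValue are split, so other
// arguments that contain spaces are left untouched.
inline std::vector<std::string> SplitFusedFlags(
    const std::vector<std::string>& args) {
  static const char* const kFlagsWithValue[] = {"-isystem", "-iquote"};
  std::vector<std::string> out;
  for (const std::string& arg : args) {
    bool split = false;
    for (const char* flag : kFlagsWithValue) {
      const std::string prefix = std::string(flag) + " ";
      // Match "<flag> <value>" with a non-empty value.
      if (arg.size() > prefix.size() &&
          arg.compare(0, prefix.size(), prefix) == 0) {
        out.push_back(flag);
        out.push_back(arg.substr(prefix.size()));
        split = true;
        break;
      }
    }
    if (!split) out.push_back(arg);
  }
  return out;
}
```

Arguments that are already separate, such as `"-isystem"` followed by `"other"`, pass through unchanged.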
Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an
invalid file name.
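
The exact contents of `__FUNCTION__` are compiler-specific, so one defensive option is to sanitize the string before using it in a path. The helper below is a hypothetical sketch of that idea; the commit itself may have solved the problem differently, for example by using a fixed file name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Replaces every character that is not alphanumeric, '_', '-', or '.' with
// '_', so the result is safe to use as a single file-name component.
inline std::string SanitizeForFileName(const std::string& name) {
  std::string out = name;
  for (char& c : out) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (!std::isalnum(uc) && c != '_' && c != '-' && c != '.') c = '_';
  }
  return out;
}
```

A test could then build its path from the temp directory, a path separator, and `SanitizeForFileName(__FUNCTION__)`.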

PiperOrigin-RevId: 846275995
…iguous send/recv buffers

Imported from GitHub PR openxla/xla#35463

With latest NCCL we can use `ncclAlltoall` API directly without having to launch grouped send and recv operations.
Copybara import of the project:

--
0630f4d48049b211442dcb1754e521a4b1f37f7b by Eugene Zhulenev <[email protected]>:

[xla:gpu] Support ncclAlltoall directly for contiguous send/recv buffers

Merging this change closes tensorflow#35463

PiperOrigin-RevId: 846277559
…is supported by libraries.

PiperOrigin-RevId: 846299624
We can add an output pointer to StreamState, and it will then have all the information needed for the rendezvous. There is no need for a separate RendezvousValue struct.

PiperOrigin-RevId: 846313928
For example if we have a fusion

```
dot
bitcast1
...
bad_op
...
bitcast2
...
ROOT root = ...
```

we can still benefit from sinking bitcast2 even though instructions between dot and bad_op will not change.

PiperOrigin-RevId: 846314341
tensorflower-gardener and others added 19 commits December 23, 2025 20:54
PiperOrigin-RevId: 848393091
PiperOrigin-RevId: 848423026
PiperOrigin-RevId: 848434764
PiperOrigin-RevId: 848441651
…stub.

The `xtile_compiler` target now acts as a selector, depending on either `xtile_compiler_impl` or `xtile_compiler_stub` based on whether CUDA or ROCm is configured. The full implementation is moved to the new `xtile_compiler_impl` target, while `xtile_compiler_stub` provides a minimal version for other configurations.

This has the advantage that build_cleaner can run on xtile_compiler_impl. (Doing that removed around 20 dependencies)

PiperOrigin-RevId: 848442213
PiperOrigin-RevId: 848455572
PiperOrigin-RevId: 848467225
PiperOrigin-RevId: 848475361
It has to become a part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT. So, moving it here.

PiperOrigin-RevId: 848523186
PiperOrigin-RevId: 848534440
@i-chaochen i-chaochen self-requested a review December 30, 2025 11:53
@i-chaochen
Collaborator

This test failed; it seems the backend config (h100_sxm) is incorrect.

@local_xla//xla/tools:xla_gpu_compile_lib_test_amdgpu_any                FAILED in 13.4s

[2025-12-30T13:08:37.673Z] [ RUN      ] XlaCompileLibTest.CompilesForGpuWithoutDevice
[2025-12-30T13:08:37.673Z] external/local_xla/xla/tools/xla_gpu_compile_lib_test.cc:80: Failure
[2025-12-30T13:08:37.673Z] Value of: (tsl::ReadTextProto(tsl::Env::Default(), target_config_path, &target_config))
[2025-12-30T13:08:37.673Z] Expected: is OK
[2025-12-30T13:08:37.673Z]   Actual: NOT_FOUND: /root/.cache/bazel/_bazel_root/f14ffb85b056b92f87114ec3419b920b/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/tools/xla_gpu_compile_lib_test_amdgpu_any.runfiles/org_tensorflow/xla/backends/gpu/target_config/specs/h100_sxm.txtpb; No such file or directory (of type absl::lts_20250814::Status)
[2025-12-30T13:08:37.673Z] 
[2025-12-30T13:08:37.673Z] [  FAILED  ] XlaCompileLibTest.CompilesForGpuWithoutDevice (0 ms)

@mmakevic-amd mmakevic-amd force-pushed the develop-upstream-sync-251224 branch from aeda463 to 9135a29 Compare January 12, 2026 23:58
@mmakevic-amd mmakevic-amd force-pushed the develop-upstream-sync-251224 branch from 9135a29 to b28eff1 Compare January 13, 2026 00:00
@mmakevic-amd
Author

This test failed; it seems the backend config (h100_sxm) is incorrect.

@local_xla//xla/tools:xla_gpu_compile_lib_test_amdgpu_any                FAILED in 13.4s

[2025-12-30T13:08:37.673Z] [ RUN      ] XlaCompileLibTest.CompilesForGpuWithoutDevice
[2025-12-30T13:08:37.673Z] external/local_xla/xla/tools/xla_gpu_compile_lib_test.cc:80: Failure
[2025-12-30T13:08:37.673Z] Value of: (tsl::ReadTextProto(tsl::Env::Default(), target_config_path, &target_config))
[2025-12-30T13:08:37.673Z] Expected: is OK
[2025-12-30T13:08:37.673Z]   Actual: NOT_FOUND: /root/.cache/bazel/_bazel_root/f14ffb85b056b92f87114ec3419b920b/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/tools/xla_gpu_compile_lib_test_amdgpu_any.runfiles/org_tensorflow/xla/backends/gpu/target_config/specs/h100_sxm.txtpb; No such file or directory (of type absl::lts_20250814::Status)
[2025-12-30T13:08:37.673Z] 
[2025-12-30T13:08:37.673Z] [  FAILED  ] XlaCompileLibTest.CompilesForGpuWithoutDevice (0 ms)

This is a deviceless test; the problem was in the file path. Fixed in 3a69036.

@mmakevic-amd
Author

Hi @i-chaochen, can we merge this?

Collaborator

@i-chaochen i-chaochen left a comment

Thanks!

Yes, please merge it and remember to push the tag.

@mmakevic-amd mmakevic-amd merged commit 1d673b5 into develop-upstream Jan 14, 2026
5 checks passed